Project Team Alpage

Contracts and Grants with Industry

Section: Partnerships and Cooperations

National Initiatives

ANR project Sequoia (2009 – 2011)

Participants : Benoît Sagot, Pierre Boullier, Marie Candito, Benoît Crabbé, Pascal Denis, Éric Villemonte de La Clergerie, Djamé Seddah.

Alpage plays a major role in the ANR-funded project Sequoia, led by Alexis Nasr (LIF, University of Marseille-Provence, former member of the Talana team at University Paris 7). The project aims to develop or adapt probabilistic parsing techniques in order to release a high-performance parser for French based on Syntax. It brings together NLP specialists and Machine Learning specialists, a combination that has proved very fruitful.

ANR project EDyLex (2010 – 2012)

Participants : Benoît Sagot [principal investigator], Rosa Stern, Laurence Danlos, Pascal Denis.

EDyLex is an ANR project (STIC/CONTINT) headed by Benoît Sagot. The project focuses on the dynamic acquisition of new entries in existing lexical resources that are used in syntactic and semantic parsing systems: how can an unknown word or a new named entity be detected and characterized in a text? How can it be associated with phonetic, morphosyntactic, syntactic and semantic properties and information? Various complementary techniques will be explored and combined (probabilistic and symbolic, corpus-based and rule-based, etc.). Their application to the content produced by the AFP news agency (Agence France-Presse) provides a context that is representative of the problems of lexical incompleteness and lexical creativity: indexing and the creation and maintenance of ontologies (location and person names, topics) are both necessary for handling and organizing a massive information flow (over 4,000 news wires per day).

Besides Alpage, the project partners are the LIF (Université de Méditerranée), the LIMSI (CNRS team), two small companies (Syllabs and Vecsys Research), and the AFP.

“Investissements d'Avenir” project PACTE (2012 – 2014)

Participants : Benoît Sagot, Éric Villemonte de La Clergerie, Laurence Danlos.

PACTE (Projet d'Amélioration de la Capture TExtuelle) is an “Investissements d'Avenir” project submitted within the call “Technologies de numérisation et de valorisation des contenus culturels, scientifiques et éducatifs”. It has been accepted and will start in early 2012.

PACTE aims at improving the performance of textual capture processes (OCR, manual script recognition, manual capture, direct typing), using NLP tools that rely on both statistical techniques (n-gram-based, with scalability issues) and hybrid techniques (involving lexical knowledge and POS-tagging models). It specifically addresses the application domain of written heritage. The project takes place in a multilingual context and therefore aims at developing techniques that are as language-independent as possible.

PACTE involves three companies (DIADEIS, the main partner, along with A2IA and Isako) as well as Alpage and the LIUM (University of Le Mans). It brings together business specialists, large-scale corpora and lexical resources, along with the required scientific and technical expertise.